---
title: first snkrfinder.model.cvae
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/02c_model.cvae.ipynb"
---
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))
Load the saved merged database and set the seeds. Also double-check that our data is where we expect it.
df = pd.read_pickle(f"data/{COMBINED_DF}.pkl")
np.random.seed(3333)
torch.manual_seed(3333)
image_path = L_ROOT/"data"
#image_path = D_ROOT/DBS['zappos']
batch_size = 64
L([image_path/d for d in df.path])
df = prep_df_for_datablocks(df)
Don't forget to set n_inp=1. Otherwise the number of inputs defaults to len(blocks)-1. Also note that FeatsResize is used to avoid the random jittering from resizing during training, so only the very narrow batch augmentations will be used.
# block.summary(df)
Variational Auto-Encoder for fastai
I'm going to use a generic convolutional net as the basis of the encoder, and its reverse as the decoder. This is a proof of concept for using the fastai framework; I will experiment with pre-trained resnet and MobileNet_v2 encoders later. I'd like to use MobileNet_v2 as a direct link to the "SneakerFinder" tool which motivated this experiment. [see SneakerFinder]
A variational "module" will sit between the encoder and decoder as the "Bottleneck". The Bottleneck will map the encoder (e.g. resnet) features into a latent space (e.g. ~100 dimensions) represented by standard normal variables. The "reparameterization trick" will sample from this space and the "decoder" will generate images.
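To make the "reparameterization trick" concrete, here is a minimal plain-Python sketch (deliberately without torch so it runs anywhere). The function name `reparameterize` and the list-based vectors are my own illustration; the notebook's actual `VAELayer` is not shown in this section and may differ in detail.

```python
import math
import random

def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

# Hypothetical latent parameters for a 3-dimensional latent space
rng = random.Random(3333)
mu, logvar = [0.5, -1.0, 2.0], [0.0, -2.0, 0.5]
z = reparameterize(mu, logvar, rng)
print(z)
```

The point of writing the sample this way is that the randomness lives entirely in `eps`, so `mu` and `logvar` stay differentiable outputs of the encoder.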
Finally, a simple "decoder" will sample from the variational latent space and be trained to reconstruct the images.
The intention is the latent space can be used to generate novel sneaker images.
Although we give up the original utility we are going for -- creating new sneakers via the latent space -- having otherwise equivalent non-variational autoencoders for reference will be great. Furthermore, this latent space representation will be amenable to an MMD regularization later on. This will be useful to avoid some of the limitations of the KLD as a regularizer (overestimation of variance, and some degenerate convergences). It's sort of hacky, but keeping the tooling equivalent to the betaVAE will ultimately give some advantages.
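For reference, the MMD idea can be sketched in a few lines of plain Python. This is the generic (biased) squared-MMD estimator with a Gaussian kernel, not the notebook's `MMDLoss`, which is not shown in this section; the names `gaussian_kernel` and `mmd2` and the kernel bandwidth are assumptions for illustration.

```python
import math
import random

def gaussian_kernel(x, y, sigma2=1.0):
    """RBF kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma2))

def mmd2(xs, ys, sigma2=1.0):
    """Biased estimate of the squared MMD between two samples (lists of vectors)."""
    def mean_k(a, b):
        return sum(gaussian_kernel(p, q, sigma2) for p in a for q in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

rng = random.Random(0)
dim = 4
prior = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(64)]  # samples from N(0, I)
close = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(64)]  # latents that match the prior
far = [[rng.gauss(3, 1) for _ in range(dim)] for _ in range(64)]    # latents shifted away from it
print(mmd2(prior, close), mmd2(prior, far))
```

The regularizer compares a batch of encoded latents against samples from the prior as whole distributions, rather than penalizing each sample's mu and logvar individually as the KLD does.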
It is convenient to avoid the class wrappers to simplify the param splitting for freezing the pre-trained architecture.
We could enumerate the class layers and return a sequential model, but simply making some functions to put the layers together is better.
The AnnealedLoss Callback basically injects a kl_weight parameter into the loss so we can start training without the full KLD regularization for the beta-VAE version.
The fastai Learner class runs the training loop. It took me a little digging into the code to figure out how Metrics are called, since it's not really stated anywhere in the documentation (Note: create PR for fastai for extra documentation on Metrics logic). By default one of the key Callbacks is the Recorder. It prints out the training summary at each epoch (via ProgressCallback) and collects all the Metrics. By default only the loss is a train_met; all the others are valid_met.
The Recorder resets (maps reset() to all mets) the metrics before_train and before_valid. The Recorder maps accumulate() to the metrics on after_batch. Finally
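The reset / accumulate cycle described above can be mimicked with a toy class. This is only a plain-Python stand-in: a real fastai metric's `accumulate` receives the `learn` object rather than a list of values, and `MeanMetric` is a hypothetical name used just for this sketch.

```python
class MeanMetric:
    """Toy stand-in for a fastai-style metric: reset, accumulate per batch, then read value."""
    def __init__(self, name):
        self.name = name
        self.reset()

    def reset(self):
        self.total, self.count = 0.0, 0

    def accumulate(self, batch_values):
        self.total += sum(batch_values)
        self.count += len(batch_values)

    @property
    def value(self):
        return self.total / self.count if self.count else None

metric = MeanMetric("kld")
log = []
for phase_batches in ([[1.0, 3.0], [5.0]], [[2.0], [4.0, 6.0]]):
    metric.reset()                    # the Recorder's job before each train/valid phase
    for batch in phase_batches:
        metric.accumulate(batch)      # the Recorder's job after each batch
    log.append(metric.value)          # read once the phase is over
print(log)
```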
AnnealedLossCallback will inject the latent mu and logvar and a kl_weight variable into our loss. The mu and logvar will be used to compute the KLD. The kl_weight is a scheduled weighting for the KLD. You can see the schedule graph of the parameter. At the beginning it will be 0, so the KLD part of the loss is ignored: during the first 10% of training we fit a normal auto-encoder. Then, over the next 70% of training, kl_weight gradually increases to 1, and it stays there for the remaining 20% so that the auto-encoder becomes fully variational. The way this callback is done, the loss will receive this parameter, but not the model.
n_epochs = 10
f_init = combine_scheds([.1, .7, .2], [SchedNo(0,0),SchedCos(0,1), SchedNo(1,1)])
# f = combine_scheds([.8, .2], [SchedCos(0,0), SchedCos(0,.5)])
p = torch.linspace(0.,1.,100)
pp = torch.linspace(0.,1.*n_epochs,100)
plt.plot(pp,[f_init(o) for o in p])
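The piecewise schedule plotted above can also be written out by hand, which makes the three phases explicit. This is a plain-Python sketch that, as far as I can tell, mirrors what `combine_scheds([.1, .7, .2], ...)` with `SchedNo` and `SchedCos` computes; the function names `sched_cos` and `kl_weight` are my own.

```python
import math

def sched_cos(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1.0 + math.cos(math.pi * (1.0 - pos))) * (end - start) / 2.0

def kl_weight(pct, hold=0.1, ramp=0.7):
    """Schedule over training progress pct in [0, 1]: hold at 0, cosine ramp to 1, hold at 1."""
    if pct < hold:
        return 0.0
    if pct < hold + ramp:
        return sched_cos(0.0, 1.0, (pct - hold) / ramp)
    return 1.0

for pct in (0.0, 0.05, 0.45, 0.8, 1.0):
    print(f"{pct:.2f} -> {kl_weight(pct):.3f}")
```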
WARNING: Avoid using early stopping because the AnnealedLossCallback will make the loss grow once the KL divergence weight kicks in.
I want to note something here that was a little confusing to me: params(model) is a built-in fastai function (from fastai.torch_core) which returns all of the parameters of a module, i.e.
def params(m):
    "Return all parameters of `m`"
    return [p for p in m.parameters()]
The top-level fastai core functions with simple names that almost match class attributes were one of my biggest stumbling blocks in getting acquainted with the fastai v2 API. (The other is the documentation, which is autogenerated by the nbdev framework from their development notebooks. More on that struggle, and my tips if that is troublesome for you, later.)
{% include note.html content='It is crucial that you don’t freeze the batch norm layers. The bn_splitter collects all the batchnorm layers. The simple splitting we do only freezes the encoder and leaves the latent layers (i.e. VAE or linear encoding bottleneck) and the decoder in a parameter group with the batchnorm layers.' %}
Splitters
{% include warning.html content='There are two completely different splitters in the fastai API. This splitter groups the model parameters into groups for freezing and for progressive learning rates. (The other one splits data into train and validate sets.) I got immediately confused by this when I first started with the API.' %}
MobileNet_v2 as the encoder, as a continuation of the original Sneaker Finder
simple bowtie convolutional encoder / decoder (Mimics the GOAT medium blog)
- Architecture Hyperparameters:
- Latent Size (research default 256, production default 32)
- Filter Factor Size (research default 16, production default 32)
- Latent Linear Hidden Layer Size (research default 2048, production default 1024)
- The encoder architecture is as follows with research defaults from above:
- Input 3x128x128 (conv2d block [conv2d, batchnorm2d, relu])
- 16x64x64 (conv2d block [conv2d, batchnorm2d, relu])
- 32x32x32 (conv2d block [conv2d, batchnorm2d, relu])
- 64x16x16 (conv2d block [conv2d, batchnorm2d, relu])
- 128x8x8 (conv2d block [conv2d, batchnorm2d, relu])
- Flatten to 8192
- 2048 (linear block [linear, batchnorm1d, relu])
- Split the 2048 dimension into mu and log variance for the parameters of the latent distribution
- Latent mu size 256 (linear layer only with bias)
- Latent logvar size 256 (linear layer only with bias)
- In the middle here you can break out the BCE and KLD loss for the final loss term and use the standard reparam trick to sample from the latent distribution.
- Decoder architecture an exact mirror
- Input 256
- 2048 (linear block [linear, relu])
- 8192 (linear block [linear, batchnorm1d, relu])
- reshape (128x8x8)
- 64x16x16 (conv2d transpose block [convtranspose2d, batchnorm2d, relu])
- 32x32x32 (conv2d transpose block [convtranspose2d, batchnorm2d, relu])
- 16x64x64 (conv2d transpose block [convtranspose2d, batchnorm2d, relu])
- 3x128x128 (conv2d transpose [convtranspose2d, sigmoid])
- For weight initialization I used a normal distribution centered at zero with 0.02 set for the stddev. Optimizer: Adam with default parameters. If I were to do it over again I'd spend more time here understanding the learning dynamics. The dataset was ~10,000 with a 70/20/10 split, batch size 64, over 120 epochs, with a learning schedule to reduce when the loss started to plateau. No crazy image augmentation, just resizing and standard flips. I used the ANN package Annoy to do the NN search for prod, normalizing the embeddings and using the cosine similarity. ANN factor was 128 for num_trees.
- MMD regularized VAE where the latents are drawn from a
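The encoder shapes listed above (3x128x128 down to 128x8x8, then flattened to 8192) can be checked with a little arithmetic. This sketch assumes stride-2 convolutions with kernel size 5 and padding 2, which is what the `ConvLayer` calls elsewhere in this notebook use; the blog description itself does not state the kernel size, and the helper names are hypothetical.

```python
def conv_out(size, ks=5, stride=2, padding=2):
    """Spatial size after one Conv2d: floor((size + 2*padding - ks) / stride) + 1."""
    return (size + 2 * padding - ks) // stride + 1

def encoder_shapes(im_size=128, filter_factor=16, n_blocks=4):
    """Return (channels, height, width) for the input and after each stride-2 conv block."""
    shapes, size = [(3, im_size, im_size)], im_size
    for i in range(n_blocks):
        size = conv_out(size)
        shapes.append((filter_factor * 2 ** i, size, size))
    return shapes

shapes = encoder_shapes()
c, h, w = shapes[-1]
flat_dim = c * h * w  # size of the vector fed to the linear block
print(shapes, flat_dim)
```

Changing `filter_factor` to the production default of 32 doubles every channel count, and therefore the flattened dimension as well.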
TODO: Ranger optimizer might really help .. test
We can also use the transfer learning VAE tooling we previously built. We just need to create the convolutional encoder and pass it in... Note that we don't have a pre-trained option, so DON'T FREEZE!
Now just wrap that simple conv block architecture into a builder. And a meta-wrapper to let us call the conv_encoder and pre-trained options with the same function. (I'll also put the get_pretrained_parts function here now even though we won't use it till the next section, so that we can make the get_encoder_parts generic wrapper handle both properly.)
latent_dim = 128
# equalize KLDiv wrt errors per pixel
alpha = 3*IMG_SIZE*IMG_SIZE/latent_dim
alpha /= 20 # 5% regularizer
batchmean = True
useL1 = False
hidden_dim = None
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_AE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 64
dls = block.dataloaders(df, batch_size=batch_size)
arch='vanilla'
vae = AE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = AELoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
{% include note.html content='The to_fp16() callbacks work but increasing the batch size doesn’t really speed things up.' %}
lr1,lr2=learn.lr_find()
mlr, gmlr = .5*(lr1+lr2), torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
n_epoch = 100
learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
# n_epoch = 10
# learn.fit_one_cycle(n_epoch,lr_max=lr1) #, lr_max= base_lr)
# learn.show_results()
learn.show_results()
prefix = f"AE-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
# x_hat,latents = learn.model.cuda()(x)
# dummy_var1 = (z.var(dim=1).unsqueeze(-1).expand(z.size()) )
# dummy_var2 = (z.var(dim=0).unsqueeze(0).expand(z.size()) )
# dummy_var1[:10,0],dummy_var2[:10,0]
For several of the decoder and "sampler" layers I might want to turn off the nonlinearity to give us more reasonable "gaussian" outputs to the Variational layer and the generated image, which is compared with the ImageNetStats batch-normalized image.
IMPORTANT VAE TIP!!! Make sure NOT to use batch normalization and non-linearity in the linear layers of the VAE. The normalization will affect the representation and the KLD constraints.
Once we have this we can do three things:
1. develop proper VAE loss functions (including KL Divergence constraint on latent variables)
2. create callbacks (and custom learner?) for training
3. extend to a beta-variational framework with aims at creating "disentangled" latent dimensions
Putting it all together gives us our VAE! Note that we'll pass in the "parts" of the encoder for ease of using pretrained (or not) architectures. The model name will correspond to the architecture of the encoder via name.
Note that the BVAE can simply inherit from the AE class we defined above. Really, the only difference in the __init__ function is that a VAELayer, which performs the reparameterization trick, replaces the AElayer as self.bn.
### TODO: refactor the BVAE and AE to a single architecture... with a "sample" function ot
class BVAE(AE):
    """
    simple VAE made with an encoder passed in, and some builder function for the Latent (VAE reparam trick) and decoder
    """
    def __init__(self,enc_parts,hidden_dim=None, latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE):
        """
        inputs:
            enc_arch (pre-cut / pretrained)
            enc_dim
            latent_dim
            hidden_dim
            im_size,out_range
        """
        enc_arch,enc_feats,name = enc_parts
        # encoder
        # arch,cut = xresnet18(pretrained=True),-4
        # enc_arch = list(arch.children())[:cut]
        BASE = im_size//2**5
        enc_dim = enc_feats * BASE**2 # 2**(3*3) * (im_size//32)**2 #(output of resnet) #12800
        self.encoder = build_AE_encoder(enc_arch,enc_dim=enc_dim, hidden_dim=hidden_dim, im_size=im_size)
        in_dim = enc_dim if hidden_dim is None else hidden_dim
        # VAE Bottleneck
        self.bn = VAELayer(in_dim,latent_dim)
        #decoder
        self.decoder = build_AE_decoder(hidden_dim=hidden_dim, latent_dim=latent_dim, im_size=im_size,out_range=out_range)
        store_attr('name,enc_dim, in_dim,hidden_dim,latent_dim,im_size,out_range') # do i need all these?
# def decode(self, z):
# return self.decoder(z)
# def encode(self, x):
# h = self.encoder(x)
# z, mu, logvar = self.bn(h) # reparam happens in the VAE layer
# return z, mu, logvar
# def forward(self, x):
# #z, mu, logvar = self.encode(x)
# # h = self.encoder(x)
# # z, mu, logvar = self.bn(h) # reparam happens in the VAE layer
# # x_hat = self.decoder(z)
# z,mu,logvar = self.encode(x)
# x_hat = self.decode(z)
# latents = torch.stack([mu,logvar],dim=-1)
# return x_hat, latents # assume dims are [batch,latent_dim,concat_dim]
# # AE
# def decode(self, z):
# return self.decoder(z)
# def encode(self, x):
# h = self.encoder(x)
# return self.bn(h)
# def forward(self, x):
# """
# pass the "latents" out to keep the learn mechanics consistent...
# """
# h = self.encoder(x)
# z,logvar = self.bn(h)
# x_hat = self.decoder(z)
# latents = torch.stack([z,logvar] ,dim=-1)
# return x_hat , latents
A nice wrapper for building the encoder parts will be handy.
Sweet, we've verified the architecture works, but we need to train it with a loss that constrains the variational layers with the KL Divergence. Otherwise the simple MSE will diverge.
We have a couple of examples to follow:
1. TabularData Vae (fastai v2 patterning) (@EtienneT)
2. pure PyTorch Vae which is directly related to our image dataset (@AntixK)
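Before wiring this into fastai, it helps to see the KL-constrained loss in its barest form. The sketch below is plain Python for a single sample: the closed-form KLD against a standard normal prior, plus a mean-squared reconstruction term. How the notebook's `BVAELoss` actually applies `alpha`, `batchmean` and `kl_weight` is not shown in this section, so the weighting here is an assumption, and `kld` and `vae_loss` are hypothetical names.

```python
import math

def kld(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over the latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def vae_loss(x, x_hat, mu, logvar, alpha=5.0, kl_weight=1.0):
    """Mean-squared reconstruction error plus the weighted KL term (assumed weighting)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return mse + kl_weight * alpha * kld(mu, logvar)

x, x_hat = [0.0, 1.0, 0.5], [0.1, 0.9, 0.5]
mu, logvar = [1.0, 0.0], [0.0, 0.0]
print(kld(mu, logvar), vae_loss(x, x_hat, mu, logvar))
```

With `kl_weight=0` this reduces to the plain auto-encoder loss, which is exactly what the annealing schedule exploits at the start of training.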
latent_dim = 128
# equalize KLDiv wrt errors per pixel
# alpha = 3*IMG_SIZE*IMG_SIZE/latent_dim
alpha = 5
batchmean = True
useL1 = False
hidden_dim = None
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# start with KLweight on completely to enable sensible lr_find results
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
# arch,cut = resnet18(pretrained=True),-4
# enc_arch = list(arch.children())[:cut]
# enc_dim = 512
# enc_parts = enc_arch,enc_dim,'resnet18'
# replaced by:
enc_parts = get_pretrained_parts(arch=resnet18)
rnet_vae = BVAE(enc_parts, hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, rnet_vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
# learn = learn.to_fp16()
#learn.show_training_loop()
learn.freeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.remove_cb(learn.cbs[-1])
# add the KL annealing scheduler
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
# the defaults are pretty good for now
n_epochs = 20
learn.fit_one_cycle(5,lr_max= lr1)#, lr_max= base_lr)
#learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
#learn.fit_flat_cos(n_epochs)
learn.show_results()
The vae with pretrained resnet encoder seems to train to a much better end-point if we keep the resnet frozen. Hence the commented out learn.unfreeze() below.
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
learn.freeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
base_lr = 1e-5# gmlr #/= 2
epochs = 20
#learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
learn.fit_flat_cos(epochs,pct_start=.05)
learn.show_results()
prefix = f"Vae-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
latent_dim = 128
# equalize KLDiv wrt errors per pixel
# alpha = 3*IMG_SIZE*IMG_SIZE/latent_dim
alpha = 5
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': default_KL_anneal_in() })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='vanilla'
vae = BVAE(get_encoder_parts(arch), hidden_dim=None,latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
n_epoch = 40
learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
#learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
# n_epoch = 40
# learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e6, pct_start=0.3)
# learn.show_results()
learn.remove_cb(learn.cbs[-1])
# add new scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
x=2
n_epoch = 100
#learn.fit_flat_cos(n_epoch, lr=1e-4, div_final=1e5, pct_start=0.4)
learn.fit_flat_cos(n_epoch)
#learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
prefix = f"Vae-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
n_epoch = 100
learn.fit_flat_cos(n_epoch, lr=5e-5, pct_start=0.5)
#learn.fit_flat_cos(n_epoch)
#learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
prefix = f"AE-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"{prefix}-2-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
# h = vae.enc.cuda()(x)
# z,mu,logvar = vae.cuda().reparam(h)
# x_hat = vae.dec.cuda()(z)
# x_hat2,latents = vae.cuda()(x)
# h.shape,z.shape,x_hat.shape, x_hat2.shape,latents.shape
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
alpha = 20
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='vanilla'
vae = MMDVAE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
# x,y,z = learn.dls.one_batch()
# x_hat,latents = learn.model(x)
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
# WARNING: the loss is far too high to start with, so the numbers are bogus
# 1e-3 is a good place to start
n_epoch = 100
# learn.fit_flat_cos(n_epoch, lr=1e-2)
#learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e6, pct_start=0.1)
# n_epoch = 10
learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
alpha = 30
batchmean = True
useL1 = False
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 64
dls = block.dataloaders(df, batch_size=batch_size)
arch='vanilla'
vae = MMDVAE(get_encoder_parts(arch), hidden_dim=2048,latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
# WARNING: the loss is far too high to start with, so the numbers are bogus
# 1e-3 is a good place to start
n_epoch = 100
learn.fit_flat_cos(n_epoch)#, lr=lr1)
#learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e6, pct_start=0.1)
#learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
# # cut = model_meta[resnet50]['cut']
# # enc_arch = list(arch.children())[:cut]
# # nn.Sequential(*enc_arch).cuda()(x).shape
# arch=resnet18
# cut = model_meta[arch]['cut']
# arch = arch(pretrained=True)
# enc_arch = list(arch.children())[:cut]
# enc_dim = 512
# vae = MMDVAE(enc_arch,enc_dim=enc_dim, hidden_dim=2048,latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE)
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
alpha = 10
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch=resnet18
vae = MMDVAE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
learn.freeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.freeze()
n_epoch = 200
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch)#, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch,lr_max=gmlr) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'TMP'}"
filename = f"frozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
#learn.export(f'{filename}.pkl')
n_epoch = 5
learn.unfreeze()
learn.fit_flat_cos(n_epoch, lr=lr1, div_final=1e6, pct_start=0.7)
#learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e5, pct_start=0.5)
learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"unfrozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
alpha = 20
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch=resnet18
vae = MMDVAE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
learn.freeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
n_epoch = 200
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
#learn.fit_flat_cos(n_epoch, lr=lr1, div_final=1e5, pct_start=0.5)
learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"frozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
#n_epoch = 100
#learn.fit_flat_cos(n_epoch, lr=lr1, div_final=1e6, pct_start=0.05)
#learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e5, pct_start=0.5)
learn.fit_one_cycle(5) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"unfrozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
# # cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# # SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# # ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
# alpha = 8
# batchmean = False
# metrics = [MMDMetric(batchmean=batchmean,alpha=alpha),
# L1MeanMetric(),
# MuMetric(),
# StdMetric(),
# LogvarMetric(),
# L2MeanMetric(),
# WeightedKLDMetric(batchmean=batchmean,alpha=alpha),
# KLWeightMetric(),
# MuSDMetric(),
# LogvarSDMetric(),
# RawKLDMetric(batchmean=batchmean)
# ]
# batch_size = 64
# dls = block.dataloaders(df, batch_size=batch_size)
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
alpha = 10
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch=resnet18
vae = MMDVAE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
learn.unfreeze()
n_epoch = 200
learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e6, pct_start=0.1)
learn.show_results()
prefix = f"MMDVae-nofreeze{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
alpha = 20
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch=resnet18
vae = MMDVAE(get_encoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=useL1)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split) #.to_fp16() #wd=config['wd'],opt_func=ranger,
learn.unfreeze()
n_epoch = 200
learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e6, pct_start=0.1)
learn.show_results()
prefix = f"MMDVae-nofreeze{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
useL1 = False
def get_conv_parts2(enc_type='vanilla',im_size=IMG_SIZE):
    """
    make a simple convolutional ladder encoder
    TODO: make a switch on enc_type
    - vanilla, resnet or vanilla-res
    -TODO: change to 'convblock' and 'resblock'
    """
    n_blocks = 5
    BASE = im_size//2**5
    nfs = [3]+[(2**i)*BASE for i in range(n_blocks)]
    n = len(nfs)
    modules = [ConvLayer(nfs[i],nfs[i+1],
                         ks=5,stride=2,padding=2) for i in range(n - 1)]
    # modules = [ResBlock(1, nfs[i],nfs[i+1],
    #                     stride=2, act_cls=Mish) for i in range(n - 1)]
    return modules,nfs[-1],'vanilla'

def get_encoder_parts2(enc_type='vanilla',im_size=IMG_SIZE):
    encoder_parts = get_conv_parts2(enc_type=enc_type,im_size=im_size) if isinstance(enc_type,str) else get_pretrained_parts(arch=enc_type)
    return encoder_parts # returns enc_arch,enc_dim,arch.__name__
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 10
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockAE(get_resblockencoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
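The comment on `alpha` above suggests rescaling the regularizer weight by `3*IMG_SIZE**2/latent_dim` when `batchmean=True`. A quick worked calculation of that factor follows, assuming `IMG_SIZE = 160` (an assumption based on the `160x160` test tensors used later); `alpha_scale` is a hypothetical helper, not part of the notebook.

```python
def alpha_scale(im_size, latent_dim, n_channels=3):
    """Ratio of pixel count (all channels) to latent dimensions."""
    return n_channels * im_size ** 2 / latent_dim

for d in (64, 128):
    print(d, alpha_scale(160, d))
```

Halving the latent size doubles this factor, which may be what the later "doubled because latent is half?" comments are getting at.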
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
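The cell above combines the two learning-rate suggestions in two ways. Because learning rates are usually compared on a log scale, the geometric mean tends to land between the two suggestions in a more useful way than the arithmetic mean, which is dominated by the larger value. A stdlib-only sketch with made-up values for `lr1` and `lr2`:

```python
import math

def arithmetic_mean(a, b):
    return 0.5 * (a + b)

def geometric_mean(a, b):
    # mean in log space, then back; same idea as log().mean().exp()
    return math.exp(0.5 * (math.log(a) + math.log(b)))

lr_lo, lr_hi = 1e-4, 1e-2  # hypothetical lr_find suggestions
print(arithmetic_mean(lr_lo, lr_hi), geometric_mean(lr_lo, lr_hi))
```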
#learn.freeze()
n_epoch = 200
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch)#, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch,lr_max=gmlr) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'TMP'}-latent{latent_dim}"
filename = f"frozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
#learn.export(f'{filename}.pkl')
latent_dim = 128
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 20
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockAE(get_resblockencoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
#learn.freeze()
n_epoch = 200
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch)#, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch,lr_max=gmlr) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'TMP'}-latent{latent_dim}"
filename = f"frozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
#learn.export(f'{filename}.pkl')
latent_dim = 64
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 10
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockAE(get_resblockencoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
#learn.freeze()
n_epoch = 200
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch)#, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch,lr_max=gmlr) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'TMP'}-latent{latent_dim}"
filename = f"frozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
#learn.export(f'{filename}.pkl')
latent_dim = 64
# cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
# ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(),
ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
alpha = 10
# note that alpha needs to be adjusted to scale MMD regularizer compared to error for batchmean=true
#. e.g. *= 3*IMG_SIZE**2/latent_dim
batchmean = True
useL1 = False
hidden_dim = None
metrics = default_MMEVAE_metrics(alpha,batchmean,useL1)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockAE(get_resblockencoder_parts(arch), hidden_dim=hidden_dim,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = MMDLoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
#learn.freeze()
n_epoch = 200
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch)#, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch,lr_max=gmlr) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'TMP'}-latent{latent_dim}"
filename = f"frozen-{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
#learn.export(f'{filename}.pkl')
class ResBlockBVAE(BVAE):
    """
    simple VAE with a _probably_ pretrained encoder
    """
    def __init__(self,enc_parts,hidden_dim=None, latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE):
        """
        inputs:
            enc_parts: (enc_arch, enc_feats, name); enc_arch is pre-cut / pretrained
            latent_dim
            hidden_dim
            im_size,out_range
        """
        enc_arch,enc_feats,name = enc_parts
        # encoder
        # arch,cut = xresnet18(pretrained=True),-4
        # enc_arch = list(arch.children())[:cut]
        BASE = im_size//2**5
        enc_dim = enc_feats * BASE**2 # 2**(3*3) * (im_size//32)**2 #(output of resnet) #12800
        self.encoder = build_AE_encoder(enc_arch,enc_dim=enc_dim, hidden_dim=hidden_dim, im_size=im_size)
        in_dim = enc_dim if hidden_dim is None else hidden_dim
        # VAE Bottleneck
        self.bn = VAELayer(in_dim,latent_dim)
        #decoder
        self.decoder = build_ResBlockAE_decoder(hidden_dim=hidden_dim, latent_dim=latent_dim, im_size=im_size,out_range=out_range)
        store_attr('name,enc_dim, in_dim,hidden_dim,latent_dim,im_size,out_range') # do i need all these?
    # THESE ARE INHERITED..
    # def decode(self, z):
    #     z = self.decoder(z)
    #     return z
    # def reparam(self, h):
    #     return self.bn(h)
    # def encode(self, x):
    #     h = self.encoder(x)
    #     z, mu, logvar = self.reparam(h)
    #     return z, mu, logvar
    # def forward(self, x):
    #     z, mu, logvar = self.encode(x)
    #     x_hat = self.decode(z)
    #     latents = torch.stack([mu,logvar],dim=-1)
    #     return x_hat, latents # assume dims are [batch,latent_dim,concat_dim]
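The inherited `encode`/`forward` methods pass `mu` and `logvar` through the bottleneck and on to the loss. `VAELayer` and `BVAELoss` are defined elsewhere, so the following is only a generic, stdlib-only sketch of the two standard pieces involved: the reparameterization step and the closed-form KL divergence to a standard normal. The function names are assumptions for illustration.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def kld_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

rng = random.Random(3333)
mu_demo, logvar_demo = [0.5, -1.0, 0.0], [0.0, -2.0, 0.3]
z_demo = reparameterize(mu_demo, logvar_demo, rng)
print(z_demo, kld_to_standard_normal(mu_demo, logvar_demo))
```

The KL term is zero only when `mu = 0` and `logvar = 0` in every dimension, so it penalizes any departure of the encoded distribution from the prior.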
latent_dim = 128
alpha = 5 # doubled because latent is half?
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockBVAE(get_resblockencoder_parts(arch), hidden_dim=2048,latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
# the defaults are pretty good for now
n_epochs = 10
#learn.fit_one_cycle(freeze_epochs1,lr_max= lr1)#, lr_max= base_lr)
#learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
#learn.fit_flat_cos(n_epochs) #, lr=1e-4,pct_start=0.5)
learn.fit_one_cycle(n_epochs)#, lr_max= base_lr)
learn.show_results()
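This initial "burn-in" of the KLD regularization is very unstable, so the KL weight is annealed in over a short 10-epoch run first; the longer run that follows holds it constant at 1.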
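The two-phase schedule used in these cells (ramp the KL weight in during a short first run, then hold it fixed for the long run) can be sketched in a few lines. `default_KL_anneal_in` is defined elsewhere and its exact shape is not shown here, so the linear ramp below is an assumption, and both function names are hypothetical. The schedule is sampled once per epoch for brevity.

```python
def kl_weight_ramp(pct, start=0.0, end=1.0):
    """Linear ramp of the KL weight; pct is the fraction of the run completed."""
    pct = min(max(pct, 0.0), 1.0)  # clamp to [0, 1]
    return start + (end - start) * pct

def kl_weight_constant(pct, value=1.0):
    """Constant KL weight, independent of training progress."""
    return value

burn_in = [kl_weight_ramp(i / 9) for i in range(10)]         # 10-epoch first phase
main_run = [kl_weight_constant(i / 99) for i in range(100)]  # 100-epoch second phase
print(burn_in[0], burn_in[-1], main_run[0], main_run[-1])
```

Starting the KL weight near zero lets the reconstruction term shape the encoder first, which is one common way to soften the instability noted above.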
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
base_lr = 1e-5# gmlr #/= 2
epochs = 100
#learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
#learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
learn.fit_flat_cos(epochs,div_final=1000)#,lr=1e-4)
learn.show_results()
prefix = f"BVae-{'2step10_100'}-latent{latent_dim}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
x = 1
latent_dim = 128
alpha = 10 # doubled because latent is half?
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockBVAE(get_resblockencoder_parts(arch), hidden_dim=2048,latent_dim=128, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
# # the defaults are pretty good for now
n_epochs = 10
learn.fit_one_cycle(n_epochs)#, lr_max= base_lr)
# #learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
# learn.fit_flat_cos(n_epochs, lr=1e-4,pct_start=0.5)
# learn.show_results()
# # the defaults are pretty good for now
# n_epochs = 10
# learn.fit_one_cycle(10)#, lr_max= base_lr)
# #learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
# #learn.fit_flat_cos(n_epochs, lr=1e-4,pct_start=0.5)
# learn.show_results()
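This initial "burn-in" of the KLD regularization is very unstable, so the KL weight is annealed in over a short 10-epoch run first; the longer run that follows holds it constant at 1.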
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
epochs = 100
#learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
#learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
learn.fit_flat_cos(epochs,div_final = 1000)#,lr=1e-4)
learn.show_results()
learn.lr
# filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
# learn.save(filename)
# learn.export(f'{filename}.pkl')
# base_lr = 1e-5# gmlr #/= 2
# epochs = 50
# #learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
# #learn.fit_one_cycle(epochs, lr_max= 1e-3)
# #learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
# learn.fit_flat_cos(epochs)
# learn.show_results()
# filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
# learn.save(filename)
# learn.export(f'{filename}.pkl')
# filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
# filename = "BVae-POST-1CYCLE10-latent128-resblock-alpha10_2021-03-24_21.33.02"
# learn.load(filename)
# #epochs = 5
# epochs = 10
# learn.fit_one_cycle(epochs, lr_max=.001)
# #learn.fit_flat_cos(epochs,lr=.0015,pct_start=.5,div_final=1000.0)
# #learn.fit_one_cycle(epochs,lr_max=5e-3,pct_start=0.5,div_final=100000) # gets down to ~4500 loss in 10
# learn.show_results()
prefix = f"BVae-{'2step10_100'}-latent{latent_dim}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
latent_dim = 64
alpha = 5 # doubled because latent is half?
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockBVAE(get_resblockencoder_parts(arch), hidden_dim=2048,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
n_epochs = 10
learn.fit_one_cycle(n_epochs)#,lr_max= lr1)#, lr_max= base_lr)
#learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
#learn.fit_flat_cos(n_epochs, lr=1e-4,pct_start=0.5)
learn.show_results()
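This initial "burn-in" of the KLD regularization is very unstable, so the KL weight is annealed in over a short 10-epoch run first; the longer run that follows holds it constant at 1.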
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
base_lr = 1e-5# gmlr #/= 2
epochs = 100
#learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
#learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
#learn.fit_flat_cos(epochs,lr=1e-4)
learn.fit_flat_cos(epochs, div_final=1000.0)#,lr=1e-3)
learn.show_results()
prefix = f"BVae-{'2step10_100'}-latent{latent_dim}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
latent_dim = 64
alpha = 10 # doubled because latent is half?
batchmean = True
useL1 = False
# SaveModelCallback(fname=datetime.now().strftime('%Y-%m-%d %Hh%M.%S'), every_epoch=True),
cbs = [AnnealedLossCallback(),TerminateOnNaNCallback(), ParamScheduler({'kl_weight': SchedNo(1.,1.) })]
metrics = default_VAE_metrics(alpha,batchmean,useL1)
block = get_ae_DataBlock(aug=True)
batch_size = 128
dls = block.dataloaders(df, batch_size=batch_size)
arch='resblock'
vae = ResBlockBVAE(get_resblockencoder_parts(arch), hidden_dim=2048,latent_dim=latent_dim, im_size=IMG_SIZE,out_range=OUT_RANGE)
# let beta be calculated by : 3*im_size*im_size/latent_dim
loss_func = BVAELoss(batchmean=batchmean,alpha=alpha,useL1=False)
learn = Learner(dls, vae, cbs=cbs,loss_func=loss_func, metrics=metrics,splitter=AE_split)#.to_fp16() #wd=config['wd'],opt_func=ranger,
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': default_KL_anneal_in()} ) )
# the defaults are pretty good for now
n_epochs = 10
learn.fit_one_cycle(n_epochs)#,lr_max= lr1)#, lr_max= base_lr)
#learn.fit_flat_cos(n_epochs, lr=lr1, pct_start=0.5)
#learn.fit_flat_cos(n_epochs, lr=1e-4,pct_start=0.5)
learn.show_results()
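This initial "burn-in" of the KLD regularization is very unstable, so the KL weight is annealed in over a short 10-epoch run first; the longer run that follows holds it constant at 1.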
learn.remove_cb(learn.cbs[-1])
# add new constant scheduler
learn.add_cb(ParamScheduler({'kl_weight': SchedNo(1.,1.) }) )
#learn.unfreeze()
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
base_lr = 1e-5# gmlr #/= 2
epochs = 100
#learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div)
#learn.fit_one_cycle(epochs, lr_max= 1e-3)
#learn.fit_flat_cos(epochs,lr=lr1,pct_start=.05)
learn.fit_flat_cos(epochs,div_final=1000.)
learn.show_results()
prefix = f"BVae-{'2step10_100'}-latent{latent_dim}"
filename = f"{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
class LatentTuple(fastuple):
    "Basic type for tuple of tensor (vectors)"
    _show_args = dict(s=10, marker='.', c='r')
    @classmethod
    def create(cls, ts):
        if isinstance(ts,tuple):
            mu,logvar = ts
        elif ts is None:
            mu,logvar = None,None
        else:
            mu = None
            logvar = None
        if mu is None: mu = torch.empty(0)
        elif not isinstance(mu, Tensor): mu = Tensor(mu) # keep the converted tensor
        if logvar is None: logvar = torch.empty(0)
        elif not isinstance(logvar,Tensor): logvar = Tensor(logvar) # keep the converted tensor
        return cls( (mu,logvar) )
    def show(self, ctx=None, **kwargs):
        mu,logvar = self
        if not isinstance(mu, Tensor) or not isinstance(logvar,Tensor): return ctx
        title_str = f"mu-> {mu.mean():e}, {mu.std():e} logvar->{logvar.mean():e}, {logvar.std():e}"
        if 'figsize' in kwargs: del kwargs['figsize']
        if 'title' in kwargs: kwargs['title']=title_str
        if ctx is None:
            _,axs = plt.subplots(1,2, figsize=(12,6))
            x=torch.linspace(0,1,mu[0].shape[0])
            axs[0].scatter(x, mu[:], **{**self._show_args, **kwargs})
            axs[1].scatter(x, logvar[:], **{**self._show_args, **kwargs})
            ctx = axs[1]
        ctx.scatter(mu[:], logvar[:], **{**self._show_args, **kwargs})
        return ctx
# could we do a typedispatch to manage the transforms...?
def VAETargetTupleBlock():
    return TransformBlock(type_tfms=VAETargetTuple.create, batch_tfms=IntToFloatTensor)
def LatentTupleBlock():
    return TransformBlock(type_tfms=LatentTuple.create, batch_tfms=noop)
# class TensorPoint(TensorBase):
# "Basic type for points in an image"
# _show_args = dict(s=10, marker='.', c='r')
# @classmethod
# def create(cls, t, img_size=None)->None:
# "Convert an array or a list of points `t` to a `Tensor`"
# return cls(tensor(t).view(-1, 2).float(), img_size=img_size)
# def show(self, ctx=None, **kwargs):
# if 'figsize' in kwargs: del kwargs['figsize']
# x = self.view(-1,2)
# ctx.scatter(x[:, 0], x[:, 1], **{**self._show_args, **kwargs})
# return ctx
latent_dim = 128
dropout = .2
im_size = IMG_SIZE
n_blocks = 5
nfs = [3] + [2**i*n_blocks for i in range(n_blocks+1)]
nfs.reverse()
# decoder = nn.Sequential(
# nn.Linear(latent_size, 16),
# UnFlatten(4),
# ResBlock(1, 3, 4, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, 4, 8, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, 8, 16, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# nn.Conv2d(16, 1, 3),
# nn.Dropout2d(dropout),
# #nn.AdaptiveAvgPool2d((3,im_size, im_size))
# )
n_blocks = 5
hidden_dim = 2048
out_range = [-1,1]
tst = nn.Sequential(
nn.Linear(latent_dim,hidden_dim), #nn.Linear(latent_dim, 16)
nn.Linear(hidden_dim,im_size*n_blocks*n_blocks), #nn.Linear(latent_dim, 16)
ResizeBatch(im_size,n_blocks,n_blocks),#UnFlatten(n_blocks), #4
ResBlock(1, nfs[0], nfs[1], act_cls=Mish), #ResBlock(1, 1, 4, act_cls=Mish),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[1], nfs[2], act_cls=Mish), #RResBlock(1, 4, 8, act_cls=Mish),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[2], nfs[3], act_cls=Mish), #ResBlock(1, 8, 16, act_cls=Mish),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[3], nfs[4], act_cls=Mish), #nn.Conv2d(16, 1, 3),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[4], nfs[5], act_cls=Mish), #nn.Conv2d(16, 1, 3),
nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[5], nfs[6], act_cls=Mish), #nn.Conv2d(16, 1, 3),
#nn.Dropout2d(dropout),
#nn.Upsample(scale_factor=2), #
#nn.AdaptiveAvgPool2d((3,im_size, im_size)),
SigmoidRange(*out_range), #nn.Sigmoid()
)
tst,nfs
inp= torch.randn((32,latent_dim))
#last_size = model_sizes(tst ) #[-1][1]
#num_features_model(tst)
#last_size
#nfs
tst(inp).shape,nfs
#last_size
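The shape bookkeeping of the `tst` decoder above can be checked without building the model. This stdlib-only sketch reproduces the reversed `nfs` list and tracks the `(channels, height, width)` after each `ResBlock`, assuming stride-1 blocks keep the spatial size and each `nn.Upsample(scale_factor=2)` doubles it. The helper name is hypothetical.

```python
def decoder_shapes(n_blocks=5):
    """Shapes after the resize step and after each ResBlock of the decoder."""
    nfs = [3] + [2 ** i * n_blocks for i in range(n_blocks + 1)]
    nfs.reverse()                  # widest stage first, 3 output channels last
    size = n_blocks                # resize step yields (nfs[0], n_blocks, n_blocks)
    shapes = [(nfs[0], size, size)]
    n_res = len(nfs) - 1
    for i in range(n_res):
        shapes.append((nfs[i + 1], size, size))  # stride-1 ResBlock keeps H and W
        if i < n_res - 1:
            size *= 2                            # upsample after all but the last block
    return shapes

for shape in decoder_shapes():
    print(shape)
```

With `n_blocks=5` the first stage has 160 channels, which only lines up with `im_size*n_blocks*n_blocks` features from the linear layer because `im_size` is assumed to be 160 here.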
z_dim = 100
enc = nn.Sequential(
ResBlock(1, 1, 16, act_cls=nn.ReLU, norm_type=None),
nn.MaxPool2d(2, 2),
ResBlock(1, 16, 4, act_cls=nn.ReLU, norm_type=None),
nn.MaxPool2d(2, 2),
Flatten()
)
# torch.Size([32, 1, 28, 28])
# torch.Size([32, 16, 28, 28])
# torch.Size([32, 16, 14, 14])
# torch.Size([32, 4, 14, 14])
# torch.Size([32, 4, 7, 7])
# torch.Size([32, 196])
latent_size = 100
enc = nn.Sequential(
ResBlock(1, 3, 5, stride=2, act_cls=Mish),# 1->3
ResBlock(1, 5, 5, stride=2, act_cls=Mish),
ResBlock(1, 5, 1, stride=2, act_cls=Mish),
Flatten(),
nn.Linear(400, latent_size) # 16->400
)
# torch.Size([32, 1, 28, 28])
# torch.Size([32, 5, 14, 14])
# torch.Size([32, 5, 7, 7])
# torch.Size([32, 1, 4, 4])
# torch.Size([32, 16])
# torch.Size([32, 4])
inp= torch.randn((32,3,160,160))
for ii in range(0,8):
    print(enc[:ii](inp).shape)
z = enc(inp)
dropout=0
dec = nn.Sequential(
nn.Linear(latent_size, 16),
UnFlatten(4),
ResBlock(1, 1, 4, act_cls=Mish), #4->5
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, 4, 8, act_cls=Mish),
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, 8, 16, act_cls=Mish),
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
nn.Conv2d(16, 3, 3), #1->3
#nn.Dropout2d(dropout),
nn.AdaptiveAvgPool2d((28, 28)),
nn.Sigmoid()
)
# torch.Size([32, 4])
# torch.Size([32, 16])
# torch.Size([32, 1, 4, 4])
# torch.Size([32, 4, 4, 4])
# torch.Size([32, 4, 8, 8])
# torch.Size([32, 8, 8, 8])
# torch.Size([32, 8, 16, 16])
# torch.Size([32, 16, 16, 16])
# torch.Size([32, 16, 32, 32])
# torch.Size([32, 1, 30, 30])
# torch.Size([32, 1, 28, 28])
for ii in range(0,12):
    print(dec[:ii](z).shape)
n_blocks = 5
BASE = im_size//2**5
nfs = [3]+[(2**i)*BASE for i in range(n_blocks)]
n = len(nfs)
hidden_dim = 2048
BASE = im_size//2**5
# encoder
in_dim = nfs[-1] * BASE**2
modules = [ResBlock(1, nfs[i],nfs[i+1],
stride=2, act_cls=Mish) for i in range(n - 1)]
# enc = nn.Sequential(
# ConvLayer(nfs[0],nfs[1],ks=5,stride=2,padding=2),
# ConvLayer(nfs[1],nfs[2],ks=5,stride=2,padding=2),
# ConvLayer(nfs[2],nfs[3],ks=5,stride=2,padding=2),
# ConvLayer(nfs[3],nfs[4],ks=5,stride=2,padding=2),
# ConvLayer(nfs[4],nfs[5],ks=5,stride=2,padding=2),
# Flatten(),
# LinBnDrop(in_dim,hidden_dim,bn=True,p=0.0,act=nn.ReLU(),lin_first=True)
# )
enc = nn.Sequential(*modules,
Flatten(),
LinBnDrop(in_dim,hidden_dim,bn=True,p=0.0,act=nn.ReLU(),lin_first=True)
)
nfs.reverse()
print(nfs)
#last_size = model_sizes(enc, size=(28,28))[-1][1]
encoder = nn.Sequential(enc, nn.Linear(hidden_dim, z_dim))
decoder = nn.Sequential(
nn.Linear(z_dim, im_size*n_blocks*n_blocks),
ResizeBatch(im_size,n_blocks,n_blocks),#UnFlatten(n_blocks), #4
ResBlock(1, nfs[0], nfs[1], ks=1, act_cls=nn.ReLU, norm_type=None),
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
ResBlock(1, nfs[1], nfs[2], act_cls=nn.ReLU, norm_type=None),
#nn.Dropout2d(dropout),
nn.Upsample(scale_factor=2),
nn.Conv2d(nfs[2], 3, 3, padding=1),
#nn.Dropout2d(dropout),
nn.Sigmoid()
)
#last_size
# nn.Linear(latent_dim,hidden_dim), #nn.Linear(latent_dim, 16)
# nn.Linear(hidden_dim,im_size*n_blocks*n_blocks), #nn.Linear(latent_dim, 16)
# ResizeBatch(im_size,n_blocks,n_blocks),#UnFlatten(n_blocks), #4
# ResBlock(1, nfs[0], nfs[1], act_cls=Mish), #ResBlock(1, 1, 4, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[1], nfs[2], act_cls=Mish), #RResBlock(1, 4, 8, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[2], nfs[3], act_cls=Mish), #ResBlock(1, 8, 16, act_cls=Mish),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[3], nfs[4], act_cls=Mish), #nn.Conv2d(16, 1, 3),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[4], nfs[5], act_cls=Mish), #nn.Conv2d(16, 1, 3),
# nn.Dropout2d(dropout),
# nn.Upsample(scale_factor=2),
# ResBlock(1, nfs[5], nfs[6], act_cls=Mish), #nn.Conv2d(16, 1, 3),
inp= torch.randn((32,3,160,160))
#encoder[:1](inp).shape
for ii in range(0,10):
    print(enc[:ii](inp).shape)
z = encoder(inp)
for ii in range(0,14):
    print(decoder[:ii](z).shape)
class UnFlatten(Module):
    def __init__(self, size=7):
        self.size = size
    def forward(self, input):
        return input.view(input.size(0), -1, self.size, self.size)
class MMD_VAE(Module):
    def __init__(self, latent_size):
        self.encoder = nn.Sequential(
            ResBlock(1, 1, 5, stride=2, act_cls=Mish),
            ResBlock(1, 5, 5, stride=2, act_cls=Mish),
            ResBlock(1, 5, 1, stride=2, act_cls=Mish),
            Flatten(),
            nn.Linear(16, latent_size)
        )
        dropout=0
        self.decoder = nn.Sequential(
            nn.Linear(latent_size, 16),
            UnFlatten(4),
            ResBlock(1, 1, 4, act_cls=Mish),
            nn.Dropout2d(dropout),
            nn.Upsample(scale_factor=2),
            ResBlock(1, 4, 8, act_cls=Mish),
            nn.Dropout2d(dropout),
            nn.Upsample(scale_factor=2),
            ResBlock(1, 8, 16, act_cls=Mish),
            nn.Dropout2d(dropout),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(16, 1, 3),
            nn.Dropout2d(dropout),
            nn.AdaptiveAvgPool2d((28, 28)),
            nn.Sigmoid()
        )
    def forward(self, X):
        latent = self.encoder(X)
        return self.decoder(latent), latent
#decoder
n_blocks = 5
nfs = [3] + [2**i*n_blocks for i in range(n_blocks+1)]
nfs.reverse()
n = len(nfs)
tst = nn.Sequential(
    nn.Linear(latent_dim,hidden_dim), #nn.Linear(latent_dim, 16)
    nn.Linear(hidden_dim,im_size*n_blocks*n_blocks), #nn.Linear(latent_dim, 16)
    UnFlatten(n_blocks), #4
    ResBlock(1, nfs[0], nfs[1], act_cls=Mish), #ResBlock(1, 1, 4, act_cls=Mish),
    nn.Dropout2d(dropout),
    nn.Upsample(scale_factor=2),
    ResBlock(1, nfs[1], nfs[2], act_cls=Mish), #RResBlock(1, 4, 8, act_cls=Mish),
    nn.Dropout2d(dropout),
    nn.Upsample(scale_factor=2),
    ResBlock(1, nfs[2], nfs[3], act_cls=Mish), #ResBlock(1, 8, 16, act_cls=Mish),
    nn.Dropout2d(dropout),
    nn.Upsample(scale_factor=2),
    ResBlock(1, nfs[3], nfs[4], act_cls=Mish), #nn.Conv2d(16, 1, 3),
    nn.Dropout2d(dropout),
    #nn.AdaptiveAvgPool2d((3,im_size, im_size)), # disabled: AdaptiveAvgPool2d expects a 2-D (H, W) output size
    SigmoidRange(*out_range), #nn.Sigmoid()
)
# leftover fragment from a decoder builder, kept for reference:
# *modules,
# ConvLayer(nfs[-2],nfs[-1],
#           ks=1,padding=0, norm_type=None, #act_cls=nn.Sigmoid) )
#           act_cls=partial(SigmoidRange, *out_range)))
lr1,lr2=learn.lr_find()
mlr = .5*(lr1+lr2)
#geometric mean
gmlr = torch.tensor([lr1,lr2]).log().mean().exp().tolist()
lr1,lr2,mlr,gmlr
n_epoch = 10
#learn.fit_flat_cos(n_epoch) #, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch, lr=lr1, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
n_epoch = 40
#learn.unfreeze()
#learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e6, pct_start=0.2)
learn.fit_flat_cos(n_epoch, lr=lr1, div_final=1e6, pct_start=0.05)
#learn.fit_flat_cos(n_epoch, lr=1e-3, div_final=1e5, pct_start=0.5)
#learn.fit_one_cycle(n_epoch) #, lr_max= base_lr)
learn.show_results()
prefix = f"MMDVae-{'bmean' if batchmean else 'mean'}{'l1' if useL1 else 'l2'}"
filename = f"frozen{prefix}-{learn.model.name}-alpha{alpha:d}_{datetime.now().strftime('%Y-%m-%d_%H.%M.%S')}"
learn.save(filename)
learn.export(f'{filename}.pkl')
x=1
x
def create_encoder(nfs,ks,conv=nn.Conv2d,bn=nn.BatchNorm2d,act_fn = nn.ReLU):
    """
    constructor for generic convolutional encoder
    """
    n = len(nfs)
    conv_layers = [nn.Sequential(ConvBnRelu(nfs[i],nfs[i+1],kernel_size=ks[i],
                                            conv = conv,bn=bn,act_fn=act_fn, padding = ks[i] //2 ),
                                 Downsample(channels=nfs[i+1],filt_size=3,stride=2))
                   for i in range(n-1)]
    convs = nn.Sequential(*conv_layers)
    return convs
def create_encoder_denseblock(n_dense,c_start):
    """
    constructor for resnet with dense blocks (?)
        "n_dense": 3,
        "c_start": 4
    """
    first_layer = nn.Sequential(ConvBnRelu(3,c_start,kernel_size=3,padding = 1),
                                ResBlock(c_start),
                                Downsample(channels=4,filt_size=3,stride=2))
    layers = [first_layer] + [
        nn.Sequential(
            DenseBlock(c_start * (2**c)),
            Downsample(channels=c_start * (2**(c+1)),filt_size=3,stride=2)) for c in range(n_dense)
    ]
    model = nn.Sequential(*layers)
    return model
def create_decoder(nfs, ks, size, conv=nn.Conv2d, bn=nn.BatchNorm2d, act_fn=nn.ReLU):
    """
    CURR VALUES:
        "nfs":[66,3*32,3*16,3*8,3*4,3*2,3,1,3],
        "ks": [ 3, 1, 3,1,3,1,3,1],
        "size": IMG_SIZE
    """
    n = len(nfs)
    # We add two channels to the first layer to include x and y channels
    first_layer = ConvBnRelu(nfs[0], #input size
                             nfs[1], # output size
                             conv = PointwiseConv,
                             bn=bn,
                             act_fn=act_fn)
    conv_layers = [first_layer] + [ConvBnRelu(nfs[i],nfs[i+1],kernel_size=ks[i-1],
                                              padding = ks[i-1] // 2,conv = conv,bn=bn,act_fn=act_fn)
                                   for i in range(1,n - 1)]
    dec_convs = nn.Sequential(*conv_layers)
    dec = nn.Sequential(SpatialDecoder2D(size),dec_convs)
    #SigmoidRange(*y_range)
    return dec
def decoder_simple(y_range=OUT_RANGE, n_out=3):
    return nn.Sequential(#UpsampleBlock(64),
                         UpsampleBlock(32),
                         nn.Conv2d(16, n_out, 1),
                         SigmoidRange(*y_range)
                         )